Abstract: DOD acquisition programs have recognized that operating and support costs dominate
the  total  life  cycle  costs  of  complex  military  systems,  and  therefore  should  be  considered  up  front 
in  the  design  process.  In  order  to  estimate  operating  costs,  which  are  predominately  related  to 
maintenance  costs,  a 'view'  of  the  conceptual  design  must  exist  that  can  be  used  to  evaluate  the 
effects  of  system  design  variables  upon  maintenance  requirements.  This  view  is  currently  best 
embodied  in  the  Failure  Modes,  Effects,  and  Criticality  Analyses  (FMECA). 

Additionally,  many  DOD  acquisition  programs  are  interested  in  designing  health  management 
systems through the optimal application of system diagnostic and prognostic techniques to
produce  substantial  safety  and  life  cycle  cost  benefits.  To  achieve  these  benefits,  a more 
systematic  and  accurate  method  to  evaluate  candidate  health  monitoring  approaches  during  the 
design  process  must  be  incorporated.  While  the  FMECA  is  a keystone  of  the  maintenance 
planning  process,  it  has  limitations  in  estimating  the  impact  of  Condition-Based  Maintenance 
(CBM)  implementation  on  life  cycle  costs.  CBM  technology  deals  not  just  with  failures,  but  also 
with  monitoring  the  progression  towards  failure  through  detection,  diagnosis,  and  prognosis.  If 
we  are  to  evaluate  maintenance  efforts  and  diagnostic/prognostic  technology  design  choices,  then 
the  failure  modes  must  be  defined  in  a way  that  deals  with  incipient  and  evolving  failures. 
Hence, the current paper discusses the development of a tool called FMECA++ for use by
designers  and  end  users  that  addresses  these  issues  and  helps  to  collaboratively  design  the  optimal 
health  management  solutions  for  complex  machinery  from  a cost  benefit  and/or  availability 
standpoint. 

We discuss the processing concept of the FMECA++ and introduce methods to optimize the
expanded  failure  mode  analysis,  health  management  metrics,  and  maintainability/availability 
considerations.  A detailed  example  of  a health  management  analysis  is  also  provided. 


Key  Words:  FMECA,  diagnostics,  prognostics,  health  management,  cost/benefit, 
availability 


Introduction:  The  application  of  health  monitoring  systems  serves  to  increase  the  overall 
reliability  of  a system  through  judicious  application  of  intelligent  condition  monitoring 
technologies.  A consistent  health  management  philosophy  integrates  the  results  from  the  health 
monitoring system for the purposes of optimizing operations and maintenance practices through
1) prediction, with confidence bounds, of the Remaining Useful Life (RUL) of critical
components, and 2) isolation of the root cause of failures after the failure effects have been
observed.  If  RUL  predictions  can  be  made,  the  allocation  of  replacement  parts  or  refurbishment 
maintenance logistic footprints. Fault isolation is a critical component to maximizing system
availability  and  minimizing  downtime  through  more  efficient  troubleshooting  efforts. 

Because of their potential impact, health monitoring and management solutions should be
considered during the initial design of a system. For example, implementing a health monitoring
technology (defined here as the combination of sensors and algorithms) that is capable of
detecting a crack in a rotating part before it reaches a critical size may allow for a less
conservative factor of safety, resulting in a cheaper and lighter design that would be too risky if
health monitoring were not used. This link between the health management system design and
the overall system design is shown in Figure 1.


[Figure 1 labels: FMECA, Diagnostic Metrics; Virtual Environment; Cost/Benefit Analysis; Parts Procurement]

Figure 1 - Health Management with System Design

In  this  figure,  health  management  system  design  is  shown  within  the  dotted  line  depicted  as  a 
“Virtual  Environment”.  The  concept  illustrated  allows  the  health  management  system  designer  to 
influence  the  “top  level”  system  design  (shown  as  “machinery”)  and  assess  the  downstream 
availability  and  life  cycle  costs  associated  with  the  “whole”  system  including  its  health 
management.  The  final  availability  and  overall  life  cycle  cost  relationships  must  be  estimated 
based  on  the  potential  designs  offered  and  an  optimization  performed  based  on  the  design  trade- 
offs. 

Because an initial system FMECA is performed during the design stage, it is a perfect link between the
critical overall system failure modes and the health management system that is designed to help
mitigate  these  failure  modes.  Hence,  a process  will  be  demonstrated  that  links  this  traditional 
FMECA  analysis  with  health  management  system  design  optimization  based  on  failure  mode 
coverage,  availability,  and  life  cycle  cost  analyses. 

Role  of  FMECA  in  Health  Management:  Traditional  Failure  Modes,  Effects  and  Criticality 
Analysis (FMECA) is typically performed in conjunction with the design process¹. FMECAs
historically contain three main pieces of information, as described below:

¹ In this case, “Design” refers to all aspects of the system (components, control, etc.) with the exception of sensors and software used
for condition monitoring.

• A list of failure modes for a particular component

• The effects of each failure mode, should it occur, ranging from the local level to the end effect

• The criticality of the failure mode (I - IV), where (I) is the most critical

While  this  type  of  failure  mode  analysis  is  beneficial  in  getting  an  initial  measure  of  system  reliability 
and  identifying  candidates  for  redundancy,  there  are  several  areas  where  fundamental  improvements 
can be made so that FMECAs can assist in health monitoring design. Four important FMECA
improvements  are  described  next. 

1)  Traditional  FMECA  does  not  address  the  precursors  or  symptoms  to  failure  modes. 

To  move  maintenance  from  reactive  to  proactive,  it  is  important  to  focus  on  both  system  and 
component  level  indications  that  the  likelihood  of  a failure  mode  has  increased.  Failure  mode 
symptoms  that  occur  prior  to  failure  are  these  indications.  An  example  of  failure  mode  symptoms 
associated  with  a bearing  would  be  an  increase  in  spike  energy  or  an  increase  in  the  oil  particulate 
count. 

2)  Traditional  FMECA  does  not  address  the  sensors  and  sensor  placement  requirements  to  observe 
failure  mode  symptoms  or  effects. 

The  right  data  is  essential  to  a health  monitoring  system.  It  is  also  important  to  have  an  optimal  level  of 
failure  mode  coverage  so  that  enough  collaborative  information  is  available  to  detect  and  isolate 
failures.  However,  the  authors’  experiences  have  reinforced  the  fact  that  simply  adding  more  sensors  is 
impractical  and  ultimately  reduces  system  reliability.  By  including  sensors  and  sensor  placement  into 
the  FMECA  analysis,  the  location  of  a particular  sensor  for  the  optimum  observational  quality  becomes 
more apparent. A simple example of this sensor placement issue might be the use of a downstream
pressure  sensor,  necessary  for  a control  function,  which  can  also  be  used  to  monitor  performance 
characteristics  of  upstream  components.  Moreover,  in  some  cases,  a simple  change  in  the 
specifications  of  the  sensor  may  provide  monitoring  capability  in  addition  to  the  desired  basic  control 
function. Increasing the dynamic range or bandwidth of an accelerometer or pressure sensor is a typical
example.

3)  Traditional  FMECA  does  not  address  health  management  technologies  for  diagnosing  and 
prognosing  faults. 

The  natural  extension  of  including  sensors  in  the  FMECA  is  inclusion  of  diagnostic  and  prognostic 
technologies  for  observing  or  predicting  failure  modes  and  effects.  Because  several  different  diagnostic 
and/or  prognostic  technologies  can  be  used  for  detecting  a common  failure  mode,  acquisition  and 
implementation  considerations  must  also  be  examined. 

4)  Traditional  FMECA  typically  focuses  on  subsystems  independently. 

System  level  symptoms  or  system  level  effects  are  not  fully  realizable  because  subsystem  interactions 
are  typically  not  considered.  This  is  a natural  result  of  the  communications  barrier  between  the 
numerous teams and vendors responsible for the development of a piece of complex machinery. As a
result, unnecessary sensors or Health Management (HM) algorithms may be implemented, or useful ones
overlooked entirely.

With  these  shortcomings  in  mind,  a new  approach  has  been  developed  as  an  extension  to  a traditional 
FMECA  that  can  be  used  in  the  design  of  health  monitoring  and  management  systems. 

Approach to Health Management Design: Figure 2 provides an overview of the approach to health
management system design optimization. A basic description of each block will be given here, while
details associated with each block will follow. First, a functional block diagram of the system must be
created  that  models  the  energy  flow  relationships  between  components.  This  functional  block  diagram 
provides  a clear  vision  of  how  components  interact  with  each  other  across  subsystems.  On  a parallel 
path,  a tabular  FMECA  is  created  that  corresponds  to  a traditional  FMECA  except  it  contains  failure 
mode  symptoms,  as  well  as  sensors  and  diagnostic/prognostic  technologies. 


[Figure 2 labels: A Design Tool for Optimizing Prognostic Health Monitoring (PHM) Requirements Using Advanced FMECA and Cost/Benefit Models; Tabular FMECA; System Optimization (Rank HM System Configurations) based on System Availability/Maintainability, LCC, and Technical Performance]

Figure 2 - Organization of the FMECA++ tool


The  information  from  the  functional  block  diagram  and  the  tabular  FMECA  is  automatically 
combined  to  create  a graphical  health  management  environment  that  contains  all  of  the  failure 
mode  attributes  as  well  as  health  management  technologies.  Once  the  graphical  health 
management  system  has  been  developed,  attributes  are  assigned  to  the  failure  modes, 
connections,  sensors  and  diagnostic/prognostic  technologies.  The  attributes  are  information  like 
historical  failure  rates,  replacement  costs,  false  alarm  rates  etc.,  which  are  used  to  generate  a 
fitness  function  for  assessing  the  benefits  of  the  health  management  system  configuration.  The 
“fitness” function criteria include system availability, reliability, and cost. Some of these
attributes  must  be  manually  determined  if  known,  while  others  are  related  to  the  attributes  of  the 
diagnostic/prognostic  technologies  which  can  be  determined  from  independent  measures  of 
performance  and  effectiveness  tests.  Finally,  the  health  management  configuration  is 
automatically  optimized  from  a cost/benefit  and/or  availability  standpoint  using  a genetic 
algorithm  approach.  The  net  result  is  a configuration  that  maintains  the  highest  system  reliability 
to  cost/benefit  ratio. 

Concept of the Functional Block Diagram: The Functional Block Diagram (FBD) contains an
integrated representation of how components, subsystems and systems interact with one another. It is
not a simulation, only a hierarchical map of physical energy flows (e.g., torque transfer, current,
pressure). This energy flow map serves as the backbone for the health management design
environment  because  it  contains  the  failure  mode  symptoms  and  effects  as  well  as  captures  their 
temporal  paths.  Figure  3 shows  an  example  of  a functional  flow  diagram  at  a “system”  level.  One 
could  select  any  of  the  components  to  reveal  specific  interactions  between  its  associated  subsystem 
components.  This  FBD  was  created  with  a DARPA  owned  program  called  GME  developed  by  ISIS 
Inc.  at  Vanderbilt  University  [7].  Other  generic  modeling  software  can  also  be  used  to  build  a FBD. 

Figure 3 - Functional Block Diagram

As  previously  mentioned,  with  this  approach,  traditional  FMECA  analyses  were  enhanced  with 
the  addition  of  sensors,  health  monitoring  technologies  and  failure  symptoms.  Figure  4 shows  an 
example of an enhanced FMECA performed on a portion of a fuel system for an F-100 engine.

In  this  example,  as  with  traditional  FMECA,  the  failure  mode  is  provided  along  with  its  effects 
(ranked  from  top  to  bottom  as  primary,  secondary,  tertiary,  etc.).  The  Criticality  or  Frequency  of 
Occurrence  of  the  failure  mode  is  ranked  from  A to  E where: 


A = Frequent,  B = Probable,  C = Occasional,  D = Remote,  E = Improbable 

In  practice,  this  Criticality  letter  would  be  associated  with  a specific  probability  of  failure  range. 

The  Severity  of  the  failure  mode  is  ranked  from  I-IV  where: 

I - Catastrophic,  II  - Critical,  III  - Marginal,  IV  - Negligible 

The Criticality and Severity are attributes of a failure mode that are used in optimizing the health
management design, as discussed later.

In  Figure  4,  the  first  FMECA  enhancement  is  that  failure  mode  symptoms  have  been  added  to  the 
“effects”  column  and  are  shaded  in  blue  (or  light  gray).  Failure  mode  symptoms  are  events  that 
can  be  observed  prior  to  the  failure  mode  occurring  or  when  the  failure  mode  is  in  a very  early 
stage  of  development.  The  effects  that  are  shown  in  yellow  (or  dark  gray)  are  downstream  failure 
modes.  In  the  case  where  an  effect  is  a downstream  failure  mode,  the  failure  mode  of  focus  could 
be  considered  a failure  mode  precursor. 

The “Component” column identifies the component immediately affected by the failure mode
while “Module” is the subsystem in which the component resides. This functional relationship is
cross-referenced with the functional block diagram. In a similar fashion, the “Sensor” column
lists the sensor that can observe the symptom or effect while “S_Module” is the subsystem in
which the sensor resides and “S_Component” is the component it is linked to. All sensors in this
example are required for control or safety purposes. Finally, “Diagnostics” and “Prognostics”
columns have been added. The “Diagnostics” column describes any discrete diagnostics (Built-In
Tests (BITs)) or algorithms that can observe the symptom or effect. The “Prognostics” column
describes  any  prognostic  algorithms  that  can  be  used  to  obtain  a RUL  prediction  on  the  failure 
mode. 

Graphical Health Management Environment

The  FBD  and  the  tabular  FMECA  contain  enough  information  to  generate  a graphical  health 
management  design  and  testing  environment  without  any  further  human  intervention.  Figure  5 
provides  a simple  representation  of  the  graphical  health  management  system  model  and  will  be 
used  to  illustrate  the  use  of  collaborative  information  to  predict  and  isolate  faults.  In  this  figure, 
the  “S’s”  represent  sensors  local  to  a component.  Failure  modes  (FM’s)  are  shown  that  originate 
in  this  component  and  their  associated  local  effects.  Downstream  effects  will  propagate  up  to  the 
next  higher  level.  Diagnostic  monitors  and  prognostic  monitors  are  also  present  in  this  model. 
Consider  the  following  example. 

The diagnostic monitor (D1) could identify that the symptoms of either Failure Mode 1 (FM1) or
Failure Mode 2 (FM2) have developed. If, in addition to this observation, the prognostic monitor
(P) linked to “FM1” determines that “FM1” has a high probability of failure, “FM1” can be
assigned more risk than “FM2”. Now consider if “P” and “D1” did not exist. In this scenario,
there  is  nothing  in  this  health  management  configuration  that  can  predict  “FM1”  or  “FM2”  before 
they  occur.  However,  the  effect  of  “FM1”  is  a symptom  of  “FM3”  and,  in  this  case,  there  is 
potential  that  the  fault  path  could  be  prevented  with  “D2”  before  higher  level  effects  develop. 
Therefore,  if  “FM3”  is  found  to  have  occurred  and  “D2”  did  not  alarm,  “FM2”  would  be  the  more 
likely  root  cause  (accounting  for  the  false-negative  potential  of  the  “S4”/“D2”  combination)  and 
fault  isolation  potential  is  improved. 
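
The isolation argument above can be made concrete with a small probability update. The sketch below is only an illustration: it assumes a Bayesian update, equal prior weight on the two candidate root causes, and hypothetical detection and false-alarm figures for the sensor/diagnostic pair. None of these choices is taken from the FMECA++ tool itself.

```python
def root_cause_posterior(prior_fm1, prior_fm2, detection_confidence,
                         false_positive, d2_alarmed):
    """Posterior probability of FM1 vs. FM2 as root cause, given FM3 occurred.

    Assumption (not from the paper): D2 alarms with probability
    `detection_confidence` when FM1 is the root cause, and with probability
    `false_positive` when FM2 is the root cause.
    """
    like_fm1 = detection_confidence if d2_alarmed else 1.0 - detection_confidence
    like_fm2 = false_positive if d2_alarmed else 1.0 - false_positive
    weight_fm1 = prior_fm1 * like_fm1
    weight_fm2 = prior_fm2 * like_fm2
    total = weight_fm1 + weight_fm2
    if total == 0.0:
        raise ValueError("observation is impossible under both hypotheses")
    return {"FM1": weight_fm1 / total, "FM2": weight_fm2 / total}


# FM3 has occurred and D2 stayed silent (hypothetical rates).
posterior = root_cause_posterior(0.5, 0.5, detection_confidence=0.95,
                                 false_positive=0.05, d2_alarmed=False)
```

With these illustrative numbers the silent “D2” shifts most of the weight to “FM2”, while the residual weight on “FM1” reflects the false-negative potential of the “S4”/“D2” combination.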


Figure 5 - Generic Graphical FMECA Representation

Health  Management  Attributes:  To  autonomously  evaluate  the  cost/benefit  of  a HM  system 
configuration,  all  aspects  of  the  system  must  ultimately  be  assigned,  or  modify,  a dollar  value.  Some  of 
these “attributes” are more easily derived than others. All attributes can be grouped into “Cost Related”
or “Technical Related”. Cost related attributes relate to true dollar values such as hardware cost or
component replacement cost, while technical related attributes include complexity factor and sensor
observational quality. The FMECA++ aspects that are assigned attributes within a HM system
include:

1. Failure modes (FM)

2.  Sensors  (S) 

3.  Connections  (Sy/E) 

4.  Diagnostics  (D) 

5. Prognostics (P)

Each of these health management system “building blocks” that make up the Integrated HM
model has “attributes” that contribute to the overall health management system configuration
cost function. A description of each of these “building block” attributes is provided next.


Failure  Modes  - Failure  Modes  have  been  assigned  a minimum  of  5 attributes.  These  are: 

1. Criticality (A-E) or Failure Rate (0-1) - (Pf)

2. Severity level (I - IV) - (S)

3. Consequential Cost of Failure Mode occurring - (CC)

4. Cost of a False Detection - (CF)

5. Cost saved with Planned Maintenance - (M)

The “Severity” (S) is a multiplier in the cost function that may represent the safety factor of a
particular failure mode. The “Consequential Cost” (CC) is the sum of the replacement,
refurbishment, maintenance, etc. costs for a particular failure mode. The downstream
effects of a failure mode are naturally accounted for in the integrated FMECA++ model. The
“Cost of False Detection” (CF) represents the cost of an inspection maintenance event, reduced
availability, etc. Finally, the “Cost Saved with Planned Maintenance” (M) is the benefit realized
by being able to predict when (with confidence bounds) a failure will occur.

Clearly,  the  failure  mode  attributes  do  not  specifically  address  a number  of  maintenance  related 
and  availability  issues.  A number  of  these  issues  are  introduced  in  a companion  paper. 

Sensors  - Sensors  were  defined  in  the  model  as  components  for  measuring  physical  quantities 
such  as  temperatures,  pressures  and  currents.  The  attributes  assigned  to  the  sensors  include: 

1. Acquisition and Implementation Cost - (AIC)

2. Criticality (A-E) or Failure Rate - (SPf)

3. Weight Cost - (W) (for aerospace applications)

4. Observational Quality (0-1) - (OQ)

The  total  “cost”  of  a particular  sensor  is  a function  of  its  utility  in  a variety  of  diagnostic  and 
prognostic  tools  as  well  as  its  role  in  control  system  functionality. 

The  “Observational  Quality”  attribute  of  a particular  sensor  is  a function  of  its  type  and  placement  with 
respect  to  the  failure  mode  being  observed.  The  identification  of  a parsimonious  suite  of  sensors  and 
their  placement  is  a necessary  step  in  the  design  of  a health  management  system  in  order  to  optimize 
the detection and prognostic capability of the available sensors. A number of different approaches have
been investigated by the authors [1] to help determine the optimum sensor selection and placement in terms of health
management. One method was via a system test and sensitivity study, wherein the observability of the
identified failure mode symptoms at each potential sensor location was determined. Locations within
the system with the largest overlapping of failure modes and the highest observability are used to select
potential locations for sensor placement. A key part of this process is a sensitivity matrix that quantifies
the  observability  of  different  variables  throughout  the  system  for  a set  of  failure  modes. 
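
One way to picture how a sensitivity matrix might be used is sketched below. The location names, failure mode labels, observability values, and the 0.5 threshold are all hypothetical, and the ranking rule (count of observable failure modes first, then summed observability) is an assumption made for illustration rather than the procedure reported in [1].

```python
def rank_locations(sensitivity, threshold=0.5):
    """Rank candidate sensor locations by failure mode overlap and observability.

    `sensitivity` maps location -> {failure mode: observability in 0-1}.
    Returns (location, number of observable failure modes, summed observability)
    tuples, best candidate first.
    """
    scores = []
    for location, row in sensitivity.items():
        observable = [fm for fm, value in row.items() if value >= threshold]
        total = sum(row[fm] for fm in observable)
        scores.append((location, len(observable), total))
    return sorted(scores, key=lambda s: (s[1], s[2]), reverse=True)


# Hypothetical sensitivity matrix: rows are locations, columns are failure modes.
sensitivity = {
    "pump_outlet": {"FM1": 0.9, "FM2": 0.7, "FM3": 0.1},
    "valve_inlet": {"FM1": 0.2, "FM2": 0.8, "FM3": 0.6},
    "manifold": {"FM1": 0.3, "FM2": 0.4, "FM3": 0.9},
}
ranking = rank_locations(sensitivity)
```

In this toy case the first two locations each observe two failure modes, and the tie is broken by the higher summed observability.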

Symptom  and  Effect  Connection  Attributes  - Symptom  and  Effect  connections  within  the 
graphical  FMECA  environment  represent  the  causal  and  temporal  links  between  failure  modes 
and  their  effects.  The  only  connection  attribute  is  “Propagation  Probability”  - (Pp)  which  is  the 
likelihood  of  an  effect  propagating  downstream. 
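
As a small worked illustration, the likelihood that an effect travels along a chain of connections can be estimated by multiplying the individual propagation probabilities. This assumes the links propagate independently, which is a simplification introduced here and not a statement about how the tool combines them. The two values used are the symptom and effect propagation probabilities listed in Figure 10.

```python
import math


def path_probability(propagation_probs):
    """Probability that an effect propagates along every link of a path,
    assuming (for illustration) that the links are independent."""
    for p in propagation_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("propagation probabilities must lie in [0, 1]")
    return math.prod(propagation_probs)


# Symptom link (Pp = 1) followed by effect link (Pp = 0.75).
chain = path_probability([1.0, 0.75])
```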

Diagnostic  and  Prognostic  Attributes  - Diagnostics  can  be  either  discrete  or  continuous. 
Discrete diagnostics are traditionally algorithms that produce 0 or 1 depending on whether a threshold
has  been  exceeded.  Many  types  of  Built  In  Tests  (BITs)  can  be  classified  as  Discrete 
Diagnostics.  An  example  of  discrete  diagnostics  is  an  Exhaust  Gas  Temperature  (EGT)  reading 
that  has  exceeded  a predetermined  level. 

Continuous  diagnostics  are  algorithms  designed  to  observe  transitional  effects  and  diagnose  a 
failure  mode  based  on  the  method  and  rate  in  which  the  effect  is  changing.  Continuous 
diagnostics  are  usually  associated  with  observing  the  severity  of  failure  mode  symptoms. 
Examples  of  continuous  diagnostics  would  be  a spike  energy  monitor  for  identifying  low  levels  of 
bearing  race  spalling  or  an  A.I.  classifier  for  diagnosing  that  a valve  is  sticking. 
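
The distinction between the two diagnostic types can be sketched in a few lines. The function names, the threshold value, and the use of a least-squares slope as the "rate of change" are assumptions chosen for illustration; an actual continuous diagnostic such as a spike energy monitor would be considerably more involved.

```python
def discrete_diagnostic(reading, threshold):
    """Discrete diagnostic: 1 if the reading exceeds the threshold, else 0."""
    return 1 if reading > threshold else 0


def trend_rate(times, values):
    """Continuous-style indicator: least-squares slope of values over time."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired samples")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    numerator = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    denominator = sum((t - mean_t) ** 2 for t in times)
    if denominator == 0:
        raise ValueError("time stamps must not all be identical")
    return numerator / denominator


# Hypothetical EGT limit check and a hypothetical rising symptom trend.
egt_flag = discrete_diagnostic(reading=650.0, threshold=600.0)
symptom_slope = trend_rate([0, 1, 2, 3], [1.0, 2.0, 3.0, 4.0])
```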

The  attributes  identified  for  Diagnostics  have  been  broken  up  into  Technical  and  Cost  related. 
The  Technical  attributes  include: 

1) Detection Confidence score (0-1) - (DC)

2) % false positive score (0-1) - (FP)

The  “Detection  Confidence  score”  can  be  used  to  simultaneously  account  for  true-negative  and 
true-positive  characteristics. 

The Cost Attribute of Diagnostics is:

1. Development, Implementation and Tech. Maintenance Cost - (DAIC)

Finally,  Prognostic  algorithms  can  use  a combination  of  sensor  data,  a-priori  knowledge  of  a 
failure  mode  and  diagnostic  information  to  predict  the  time  to  a failure  or  degraded  condition 
with  confidence  bounds.  Prognostic  algorithms  are  linked  directly  to  failure  modes  in  the 
graphical  FMECA  model.  Like  diagnostic  algorithms,  both  technical  and  cost  related  attributes 
have  been  identified  for  prognostic  algorithms. 

Technical  Attribute: 

1. Prognostic Accuracy (0-1) - (PA)

Prognostics  do  not  have  an  attribute  associated  with  false  alarms.  The  “Prognostic  Accuracy” 
accounts for the early detection quality of the technology. A physical prognostic model (e.g.,
based on an FE model) would ideally have a higher prognostic accuracy than an experience-based
model (e.g., Weibull distributions of historical failure rates).
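
To illustrate what an experience-based model of this kind might compute, the sketch below uses the standard two-parameter Weibull reliability function R(t) = exp(-(t/eta)^beta) and the conditional probability of surviving a further interval given survival to the current age. The parameter values are hypothetical and the code is not drawn from the tool described here.

```python
import math


def weibull_reliability(t, beta, eta):
    """Two-parameter Weibull reliability: probability of surviving to time t."""
    return math.exp(-((t / eta) ** beta))


def conditional_survival(age, horizon, beta, eta):
    """Probability of surviving `horizon` more hours, given survival to `age`."""
    return (weibull_reliability(age + horizon, beta, eta)
            / weibull_reliability(age, beta, eta))


# Hypothetical wear-out behaviour (beta > 1) with a 1000-hour characteristic life.
survive_next_100 = conditional_survival(age=500.0, horizon=100.0, beta=2.0, eta=1000.0)
```

With beta greater than 1 the conditional survival probability falls as the component ages, which is the behaviour an RUL estimate based on historical failure data would exploit.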

The  Cost  Attribute  for  Prognostics  is: 

1. Development, Implementation and Tech. Maintenance Cost - (PAIC)

A valid concern is how the technical attributes of diagnostic and prognostic technologies can be
determined. One method is addressed in [1], whereby algorithms are tested objectively from
performance and effectiveness standpoints using transitional run-to-failure data. Of course, in the
absence of this type of information, and with a new sensor/algorithm combination, an educated
guess may be the only option.

Health  Management  System  Optimization  - In  order  to  optimize  the  core  configuration  of  a 
health  management  system  (i.e.  what  sensors  and  associated  algorithms  to  implement)  based  on 
the  enhanced  FMECA  approach  previously  described,  a cost  or  fitness  function  that  accounts  for 
reliability,  technical  risk,  complexity  and  overall  life  cycle  costs  must  be  developed.  This  total 
“fitness”  function  will  then  be  minimized  to  arrive  at  potential  HM  system  configurations.  The 
plot  on  the  top  of  Figure  6 shows  system  dependability  as  a function  of  cost  in  the  absence  of  a 
health monitoring system. In this scenario, redundancy and high factors of safety are essential
to ensure that critical failures maintain a low failure rate [3]. The lower plot illustrates the effect
of  implementing  a HM  system.  With  effective  (and  dependable)  diagnostic  and  prognostic 
capabilities,  system  redundancy  can  be  reduced  and  the  boundaries  of  the  design  envelope  can  be 
safely  extended.  With  health  monitoring  capability,  the  overall  system  dependability  remains 
high  while  safety  is  not  compromised. 


[Figure 6 labels: Cost/Benefit of a Health Management System; Cost; Overall Cost Savings]

Figure 6 - Using HM to increase overall system reliability


The  health  management  design  environment  contains  a sufficient  amount  of  information  to 
generate  and  evaluate  a fitness  function  for  the  configuration.  This  fitness  function  is  of  the  form: 

For each Failure Mode - FM(i)

Step 1) Probability of Failure * Severity * Consequential Cost of FM(i) + (Downstream
Failure Mode Consequential Costs) * Probability of Propagation

Step 2) * HM risk reduction attributed to FM(i)

Step 3) + Cost associated with False Alarms on FM(i)

Step 4) + Total Cost of all HM technology


Specifically,  the  formulation  is  as  follows: 
Step  1 and  2 = 

Where the cost saved with planned maintenance (M) can only be realized if a prognostic
algorithm is present on the failure mode.
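
The four steps can be sketched in code as follows. Step 1 follows the verbal description, taking the "downstream consequential cost" to be the downstream failure mode's own failure rate * severity * consequential cost. The forms used for Step 2 (one minus the best detection or prediction score), for Step 3 (false positive score times the cost of a false detection), and for the planned-maintenance credit (PA times M) are assumptions made purely for illustration, since the exact formulation is not reproduced here. The attribute names mirror the symbols defined earlier, and the numbers are read from Figures 10 and 11 with the first listed failure mode assumed to be FM1.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Diagnostic:
    dc: float    # Detection Confidence score (0-1)
    fp: float    # false positive score (0-1)
    daic: float  # development, implementation and tech. maintenance cost


@dataclass
class Prognostic:
    pa: float    # Prognostic Accuracy (0-1)
    paic: float  # development, implementation and tech. maintenance cost


@dataclass
class FailureMode:
    pf: float  # failure rate (0-1)
    s: float   # severity, used as a multiplier
    cc: float  # consequential cost
    cf: float  # cost of a false detection
    m: float   # cost saved with planned maintenance
    # (downstream failure mode, propagation probability Pp) pairs
    downstream: List[Tuple["FailureMode", float]] = field(default_factory=list)
    diagnostics: List[Diagnostic] = field(default_factory=list)
    prognostic: Optional[Prognostic] = None


def base_risk(fm):
    return fm.pf * fm.s * fm.cc


def failure_mode_cost(fm):
    # Step 1: own risk plus propagated downstream risk.
    risk = base_risk(fm) + sum(pp * base_risk(d) for d, pp in fm.downstream)
    # Step 2 (ASSUMED form): scale by one minus the best available score.
    scores = [d.dc for d in fm.diagnostics]
    if fm.prognostic is not None:
        scores.append(fm.prognostic.pa)
    cost = risk * (1.0 - max(scores, default=0.0))
    # Step 3 (ASSUMED form): expected false alarm cost per diagnostic.
    cost += sum(d.fp * fm.cf for d in fm.diagnostics)
    # M is credited only when a prognostic is present (ASSUMED scaling by PA).
    if fm.prognostic is not None:
        cost -= fm.prognostic.pa * fm.m
    return cost


def fitness(failure_modes, hm_technology_cost):
    # Step 4: add the total cost of all HM technology.
    return sum(failure_mode_cost(fm) for fm in failure_modes) + hm_technology_cost


# Minimum configuration of the bearing example: no diagnostics or prognostics,
# only the speed sensor (acquisition cost 500 plus weight cost 100).
housing = FailureMode(pf=8e-3, s=4, cc=350000, cf=9000, m=12000)
spalling = FailureMode(pf=1e-2, s=3, cc=68000, cf=7000, m=2000,
                       downstream=[(housing, 0.75)])
minimum_cost = fitness([spalling, housing], hm_technology_cost=500 + 100)
```

Under these assumptions the unmonitored case evaluates to about 22,240, which agrees with the Minimum configuration cost listed in Figure 13. That agreement supports the reading of Step 1 used here, but it says nothing about the assumed forms of Steps 2 and 3.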

Configuration Optimization: The HM system optimization (optimization of the previously described
cost function) will operate between two boundaries: a “maximum” HM system configuration that
includes the “wish list” of all potential sensors and associated algorithms that achieve complete failure
mode coverage, and a “minimum” configuration that is necessary for safety and control. The
optimization algorithm will examine random configuration variations and calculate the “fitness” or cost
for each.

A genetic algorithm optimization scheme was chosen for the HM optimization because genetic
algorithms can handle optimization problems with little regard for non-linearity,
dimensionality or function complexity in general. Potential cost functions generated in the
FMECA++ environment can include hundreds of independent variables, which makes it impractical
to utilize traditional optimization techniques such as gradient descent or other derivative-based
algorithms.

The  genetic  algorithm  optimization  scheme  developed  capitalizes  on  the  benefits  of  both  the  classic  and 
elite  genetic  algorithm  approaches.  In  general,  the  genetic  algorithm  operates  by  evaluating  the 
“fitness” of a “gene pool” population within a given environment. New “generations” (potential
solutions) are created using a combination of “parent” genes and “mutations”. Only the most “fit”
genes (best solutions) are ultimately passed through the generations [5]. In terms of health
management system design optimization, the “environment” is the FMECA model while the “gene
pool”  represents  the  many  different  health  management  configurations. 

The  HM  building  blocks  that  contribute  most  effectively  to  the  minimization  of  the  “fitness”  function 
will  be  passed  on  to  the  “next  generation”.  This  process  is  described  in  the  block  diagram  in  Figure  7. 
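
A minimal sketch of such a search is given below. It encodes each configuration as a tuple of 0/1 flags (one per candidate HM building block), keeps the blocks required for safety and control switched on, seeds the population with the minimum and maximum configurations, and carries the best candidates forward unchanged. The function name, parameters, and the toy cost model are hypothetical; this is a generic genetic algorithm with elitism, not the authors' implementation.

```python
import random


def optimize_configuration(n_items, mandatory, cost_fn, pop_size=20,
                           generations=40, mutation_rate=0.05, elite=2, seed=0):
    """Search 0/1 configurations for low cost; requires n_items >= 2, pop_size >= 4."""
    rng = random.Random(seed)

    def enforce(config):
        # Blocks needed for safety/control are always present.
        return tuple(1 if i in mandatory else bit for i, bit in enumerate(config))

    population = [enforce([0] * n_items), tuple([1] * n_items)]  # min and max
    while len(population) < pop_size:
        population.append(enforce([rng.randint(0, 1) for _ in range(n_items)]))

    for _ in range(generations):
        population.sort(key=cost_fn)
        next_gen = population[:elite]                  # elitism: keep the best
        parents = population[:max(2, pop_size // 2)]   # fitter half breeds
        while len(next_gen) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n_items - 1)          # single-point crossover
            child = list(a[:cut] + b[cut:])
            for i in range(n_items):
                if rng.random() < mutation_rate:       # bit-flip mutation
                    child[i] = 1 - child[i]
            next_gen.append(enforce(child))
        population = next_gen
    return min(population, key=cost_fn)


# Hypothetical toy cost model: each block has a cost and removes some risk.
ITEM_COST = [600, 300, 800, 400]
RISK_REMOVED = [0, 200, 5000, 3000]
BASE_RISK = 10000


def toy_cost(config):
    spend = sum(c for c, bit in zip(ITEM_COST, config) if bit)
    avoided = sum(r for r, bit in zip(RISK_REMOVED, config) if bit)
    return BASE_RISK - avoided + spend


best = optimize_configuration(n_items=4, mandatory={0}, cost_fn=toy_cost)
```

Because the two boundary configurations are in the initial population and the best candidates are never discarded, the returned configuration can be no worse than either boundary under the supplied cost function.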

Figure 8 shows a 2-D contour of a simplified cost function associated with two variables. Normally the
dimensionality would be much higher and equal to the number of possible combinations between the
maximum and minimum configurations. This cost function was chosen to illustrate how genetic algorithms
work because it has three clear minima, only one of which is the global minimum (the solution we are
looking for). An initial population was generated that represents a small fraction of the possible HM
configurations. Within the optimization process, aspects of this population are combined, mutated and
re-evaluated.


[Plot title: Genetic Algorithm Plot]

Figure 7 - Genetic Algorithm Flow Chart


Example of HM design and optimization: Figure 9 shows the Maximum and Minimum HM
configurations addressing failure modes for a bearing and bearing housing.


Figure  9 - Max  and  Min  Configurations 
Notice that in the Max. configuration, a diagnostic monitor observes the vibration-related
symptoms of Inner Race Spalling, and a prognostic monitor uses motor current and speed to predict
when the spalling will occur with a high severity. If the spalling were to occur, another
diagnostic monitor (D2) would observe whether the oil temperature is too high and thus potentially
prevent cracking of the Bearing Housing (FM2). In the Minimum configuration no health monitoring
capability exists, and the speed sensor is present for control purposes only. If bearing spalling
were to occur, the risk of the housing cracking would be based entirely on the Propagation
Probability between FM1 and FM2. The attributes assigned to each of the HM components in the Max.
configuration are given in Figures 10 through 12.
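
The space between the Min. and Max. configurations can be made concrete by enumerating which monitors are retained and which sensors they then require. The following is a small sketch under stated assumptions: the labels `D1` and `P1` are hypothetical (only D2 is named in the text), the monitor-to-sensor mapping is read from the description above, and the function name is illustrative only.

```python
from itertools import combinations

# Monitors of the Max. configuration and the sensors each one reads.
# "D1" and "P1" are hypothetical labels; only D2 is named in the text.
MONITOR_SENSORS = {
    "D1": {"vibration"},               # diagnostic: inner race spalling symptoms
    "P1": {"motor_current", "speed"},  # prognostic: spalling severity
    "D2": {"oil_temp"},                # diagnostic: oil temperature too high
}
CONTROL_SENSORS = {"speed"}  # kept in every configuration for control purposes


def candidate_configurations():
    """Yield (monitors, sensors) for every subset of the Max. monitors."""
    names = sorted(MONITOR_SENSORS)
    for k in range(len(names) + 1):
        for monitors in combinations(names, k):
            sensors = set(CONTROL_SENSORS)
            for monitor in monitors:
                sensors |= MONITOR_SENSORS[monitor]
            yield frozenset(monitors), frozenset(sensors)


configs = list(candidate_configurations())
for monitors, sensors in configs:
    print(sorted(monitors), sorted(sensors))
```

With three optional monitors there are only eight candidates, so exhaustive evaluation would be feasible here; a genetic search becomes useful when the number of optional building blocks makes enumeration impractical.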


Failure Mode Attributes                    First value set   Second value set
1. Failure Rate                            1E-2              8E-3
2. Severity level (1-4)                    3 (marginal)      4 (critical)
3. Consequential Cost of Failure           $68000            $350000
4. Cost of False Detection                 $7000             $9000
5. Cost saved with planned Maintenance     $2000             $12000

Symptom and Effect Attributes
Sy1: Propagation Prob. = 1
E1:  Propagation Prob. = 0.75

Figure 10 - Failure Mode and S/E attributes


Diagnostic Attributes
Technical Attributes (TA):  1. Detection Confidence (0-1)
                            2. False positive (0-1)
Cost Attribute (CA):        1. Development and Acquisition and Tech. Main. Cost

                      TA 1.    TA 2.    CA 1.
First value set       0.75     0.2      $5000
Second value set      0.95     0.05     $1000

Prognostic Attributes
Technical Attribute (TA):   1. Prognostic Accuracy (0-1)
Cost Attribute (CA):        1. Development and Acquisition and Tech. Main. Cost

                      TA 1.    CA 1.
Values                0.85     $20000

Figure 12 - D/P attributes


Sensor Attributes
1. Acquisition and Implementation Cost ($)
2. Failure Rate (0-1)
3. Weight Cost
4. Observational Quality (0-1)

Attribute    Speed        Motor Current    Vibration    Oil Temp.
1.           $500         $200             $700         $200
2.           (missing)    3E-3             2E-3         4E-5
3.           $100         $100             $100         $200
4.           0.3          0.6              0.9          0.95

Figure 11 - Sensor Attributes


Configuration    Cost      Savings
Minimum          22,240    -
Maximum          35,641    -13,401
Optimal          13,981    8,259

Figure 13 - Optimal Configuration and Results


The results of a cost/benefit analysis of the Min. and Max. configurations are shown in Figure 13.
In the Minimum configuration there is no benefit in terms of risk reduction from an HM system,
but there is also no added cost for false alarms and HM hardware. The cost of 22,240 is the dollar
value calculated for the risk of both FM1 and FM2 occurring. In contrast, the Maximum
configuration has too much HM capability. The risk reduction for FM1 (calculated at 78%) and
FM2 (10%) is not sufficient to offset the higher risk of false alarms and the significant
technological development cost of prognostics in this case. The optimal configuration was found
to retain both the vibration diagnostic monitor and the oil temperature monitor. They provided a
fair amount of risk reduction (40% and 10% respectively) while maintaining good system reliability.
Further optimization approaches that account for maintenance plans and system availability may
be found in [11].
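
The savings column of Figure 13 is consistent with taking each configuration's cost relative to the Minimum configuration. The short check below makes that arithmetic explicit. It assumes the costs read from Figure 13 (22,240, 35,641, and 13,981) and assumes that savings are defined as the Minimum cost minus the configuration cost, which the paper does not state outright.

```python
# Costs as read from Figure 13 (dollar value of risk plus HM cost).
costs = {"Minimum": 22240, "Maximum": 35641, "Optimal": 13981}

# Assumed definition: savings relative to the Minimum (no-HM) baseline.
baseline = costs["Minimum"]
savings = {name: baseline - value for name, value in costs.items()}

for name, value in savings.items():
    print(f"{name}: {value:+,}")
```

Under this reading the Maximum configuration costs 13,401 more than having no health management at all, while the optimal configuration saves 8,259; the baseline itself has zero savings, which Figure 13 shows as a dash.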


Conclusion: An approach has been presented that extends traditional FMECA capabilities to aid in
the design of health management solutions that can reduce total ownership costs and improve
availability for complex engineered systems. This approach utilizes a graphical FMECA environment
in which failure modes, failure mode symptoms/effects, sensors, and diagnostic/prognostic technologies
are represented. The health management system configuration can be optimized from an availability
and cost/benefit standpoint with a genetic algorithm approach through analysis of the fitness attributes
of the HM system building blocks. The ultimate objective of this approach was to form a methodology
and environment that aid condition-based maintenance practices by mitigating or preventing failure
modes while still keeping sensor and diagnostic/prognostic technology costs at a minimum.